Automatically Categorizing Written Texts by Author Gender

Authors

  • Moshe Koppel
  • Shlomo Argamon
  • Anat Rachel Shimoni

Abstract

The problem of automatically determining the gender of a document's author would appear to be a more subtle problem than those of categorization by topic or authorship attribution. Nevertheless, it is shown that automated text categorization techniques can exploit combinations of simple lexical and syntactic features to infer the gender of the author of an unseen formal written document with approximately 80% accuracy. The same techniques can be used to determine if a document is fiction or non-fiction with approximately 98% accuracy.
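
The abstract does not say which features or which learning algorithm the authors used. As a rough illustration of the general idea only (simple lexical features fed to a linear text classifier), here is a minimal sketch. It makes several assumptions: the `FUNCTION_WORDS` list is hypothetical, a plain perceptron stands in for whatever learner the paper used, and `TOY_SAMPLES` holds made-up placeholder texts with arbitrary classes +1 and -1, not real data.

```python
from collections import Counter
import re

# Hypothetical feature set: a handful of English function words.
FUNCTION_WORDS = ["the", "a", "of", "and", "to", "she", "he", "with", "for", "not"]


def features(text):
    """Relative frequency of each function word, followed by a bias term."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS] + [1.0]


def train_perceptron(samples, epochs=20):
    """Train a perceptron on (text, label) pairs, with label in {+1, -1}."""
    weights = [0.0] * (len(FUNCTION_WORDS) + 1)
    for _ in range(epochs):
        for text, label in samples:
            x = features(text)
            score = sum(w * xi for w, xi in zip(weights, x))
            if label * score <= 0:  # misclassified: move weights toward the label
                weights = [w + label * xi for w, xi in zip(weights, x)]
    return weights


def predict(weights, text):
    """Return +1 or -1 for an unseen text."""
    score = sum(w * xi for w, xi in zip(weights, features(text)))
    return 1 if score > 0 else -1


# Made-up placeholder texts; the two classes are arbitrary labels.
TOY_SAMPLES = [
    ("the cat of the house of the king", 1),
    ("she went with her and she stayed with them", -1),
]

weights = train_perceptron(TOY_SAMPLES)
print(predict(weights, "the end of the road"))
```

A real experiment would need a large labeled corpus, a much richer feature set (the abstract also mentions syntactic features), and evaluation on held-out documents. The accuracy figures quoted in the abstract are the paper's own and say nothing about this sketch.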

Similar resources

Gender Identification in Russian Texts

Gender identification is the task of identifying the gender of the author of a written text. A hybrid approach has been designed for Russian texts by combining a deep neural network with a rule-based classifier. LSTM and BiLSTM have been used as part of the neural network due to their capability to learn long-term dependencies.

Automatic Detection of Gender and Number Agreement Errors in Spanish Texts Written by Japanese Learners

This paper describes the creation of a grammar to automatically detect agreement errors (gender and number) in Spanish texts written by Japanese learners. The grammar has been written using the Constraint Grammar formalism (Karlsson et al., 1995), and uses as input the morphosyntactic analysis provided by the Spanish parser HISPAL (Bick, 2006). For developing and testing the grammar, a learner ...

Categorizing spelling errors to assess L2 writing

Based on a corpus of 223 argumentative essays written by English as a foreign language learners, this study shows that spelling errors, whether detected manually or automatically, are a reliable predictor of the quality of L2 texts and that reliability is further improved by subcategorizing errors. However, the benefit derived from subcategorization is much lower in the case of errors automatica...

Using LSA to Automatically Identify Givenness and Newness of Noun Phrases in Written Discourse

Identifying given and new information within a text has long been addressed as a research issue. However, there has previously been no accurate computational method for assessing the degree to which constituents in a text contain given versus new information. This study develops a method for automatically categorizing noun phrases into one of three categories of givenness/newness, using the tax...

A Comparative Study of Metadiscourse in Academic Writing: Male vs. Female Authors of Research Articles in Applied Linguistics

Like conversation and other modes of communication, writing is a rich medium for gender performance. In fact, writing functions to construct the disciplines as well as the gender of its practitioners. Despite the significance of author gender, as one constitutive dimension of any writing, it has been relatively under-researched. One way, by means of which author gender is practiced, and reveale...

Journal title:
  • LLC

Volume 17

Publication date: 2002